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Hierarchical organization of schooling in all nations insures that international 
large-scale assessment data are multilevel where students are nested within 
schools and schools are nested within nations. Longitudinal follow-up of these 
students adds an additional level. Hierarchical or multilevel models are appro- 
priate to analyze such data (Raudenbush and Bryk 2002; Goldstein 2003). A 
ubiquitous problem, however, is that explanatory as well as outcome variables 
may be subject to missingness at any of the levels, posing the data analyst with 
a challenge. 

This chapter explains how to efficiently analyze a two-level hierarchical linear 
model given incompletely observed data where students at level 1 are nested 
within schools at level 2. This social setting may also apply to occasions nested 
within individuals, students nested within nations, and schools nested within 
nations. The efficient missing data method we use in this chapter aims to analyze 


all available data (Shin and Raudenbush 2007). The “all available data” include 


children with item as well as unit nonresponse as they belong to a school and a 
nation having observed data and thus add information to strengthen inferences 
at higher levels (Shin and Raudenbush 2011; Shin 2012). 

Section 1| clarifies the assumptions we make about missing data for efficient 
analysis of multilevel incomplete data. Section 2 summarizes currently avail- 
able methods for analysis of multilevel incomplete data. Section 3 introduces 
the missing data method we use in this chapter and explains how it efficiently 
estimates a hierarchical linear model given incomplete data. Section 4 illus- 
trates efficient analysis of a hierarchical linear model given the incompletely 
observed US data from the Programme for International Student Assessment 
(PISA, OECD 2007). Section 5 illustrates an analysis strategy with plausible 
values given the PISA data where each missing value of the outcome variable is 
filled in or imputed with five plausible values, but predictors may be subject to 
missingness. Section 6 discusses the extensions and limitations of the efficient 


missing data method. 


1 Assumptions about Missing Data 


In this chapter, we consider analysis of incompletely observed two-level data 
with the most common missing data pattern in education, a general missing 
pattern. That is, the missing data method to be introduced in section 3 ef- 
ficiently handles explanatory as well as outcome variables that are subject to 
missingness with any missing patterns at a single level or multiple levels. Con- 
sequently, we do not distinguish different types of missing patterns produced by 
item or unit nonresponse. 

Nearly all educational data sets are multilevel and have missing data. Un- 
til quite recently, researchers facing multilevel incomplete data analysis have 


dropped cases with missing values. The complete-case analysis is more prob- 


lematic in multilevel analysis than it is in single-level analysis. A missing na- 
tional characteristic, for example, implies deletion of not only the nation but 
all nested schools and students within the nation. Such analysis lowers sample 
sizes at multiple levels to produce inefficient inferences. Consequently, the stan- 
dard errors of parameter estimates will be larger than they should, resulting 
in conservative hypothesis tests and excessively wide confidence intervals. In 
addition, the missing data patterns may be associated with the deleted data 
to produce biased inferences (Little and Rubin 2002). Unbiased analysis is 
achieved when missing data are a random sample of the complete data, so that 
the missing data patterns are not associated with complete data, i.e., when data 
are missing completely at random (MCAR, Rubin 1976). However, data MCAR 
is seldom a reasonable assumption. 

Missing values may be imputed or filled in by ad hoc imputation meth- 
ods such as a sample-mean substitution and a regression model-based predic- 
tion. The substituted or predicted values, however, under-represent the true 
uncertainty in the missing values to produce underestimated standard errors of 
parameter estimates. Consequently, the resulting hypothesis tests will be too 
liberal and the confidence intervals too narrow. Missing values may also be 
filled in by other imputation methods such as a last-observation-carry-forward 
method (Krueger 1999) and hot deck imputation (Little and Rubin 2002) where 
missing values are replaced with observed values of similar units. These single 
imputation methods take each imputed value as if it is the true value for sub- 
sequent complete-data analysis. The estimation, however, does not take into 
account uncertainty due to missing data to yield understated standard errors 
of parameter estimates. In general, these ad hoc imputation methods are not 
recommended unless missing data consist of a small fraction of complete data. 


In this chapter, we shall employ two comparatively mild assumptions in 


many applications that data are missing at random (MAR) and that the pa- 
rameters, 6, of the desired hierarchical model are distinct from the nuisance pa- 
rameters, ¢, of the missing data generating mechanism or the model for missing 
patterns (Rubin 1976). The MAR assumption means that missing data patterns 
are conditionally independent of missing data given observed data. That is, the 
association between missing data patterns and complete data is explained by 
observed data. When variables subject to missingness are highly correlated, for 
example, the observed data are likely to explain the association between missing 
data patterns and complete data to make the MAR assumption plausible (Shin 
and Raudenbush 2011; Shin 2012). The MAR assumption requires that we 
analyze all observed data for efficient analysis. The distinct parameter assump- 
tion is reasonable if there is little reason to believe that knowing the nuisance 
parameters ¢ provides extra information on the desired parameters 6 (Schafer 
1997). In regression analysis for the effect of socioeconomic status on a math 
achievement outcome, for example, a student may not take the exam because 
she is sick or because she moves to a different school due to relocation of her 
family. It is not reasonable to believe that knowing such a mechanism would 
provide more information about the desired effect. In that case, the distinct 
parameter assumption is reasonable. On the other hand, if low performers are 
more likely to miss the exam than high performers such that knowing ¢ of the 
missing data generating mechanism is informative about the desired effect, then 
the distinct parameter assumption is not reasonable. Data missing under these 
two assumptions are called ignorable (Little and Rubin 2002). The ignorable 
missing data assumption is much weaker than the MCAR assumption (Rubin 
1976; Schafer 1997; Little and Rubin 2002). Note that the MCAR implies the 
distinct parameter assumption. 


Missing data neither MAR nor MCAR are said to be not missing at random 


(NMAR, Rubin 1976; Little and Rubin 2002). Under this assumption, missing 
patterns are associated with observed as well as missing data. A longitudinal 
study, for example, produces informative dropouts where the dropout patterns 
are associated with unobserved as well as observed outcomes (Diggle and Ken- 
ward 1994; Little 1995; Muthén et al. 2011). Consequently, both 6 and ¢ have to 
be estimated from the joint distribution of complete data and missing patterns. 
This amounts to estimating, in addition to the desired hierarchical model, the 
model for missing patterns. Because the joint model involves missing data, the 
model assumptions yield parameters that are not uniquely identifiable, or pa- 
rameter estimates that are not supported by observed data (Little 2009). Such 
a parameter may be constrained for identification or assumed to take a value in 
estimation. In general, little evidence exists in observed data to support such 
a parameter estimate. Consequently, sensitivity analysis should follow estima- 
tion of the joint distribution over the range of plausible values of the parameter 
(Little 1995, 2009). Therefore, analysis given data NMAR is more challenging 
than that given data MAR or MCAR. 

In this chapter, we employ the ignorable missing data assumption that is 
quite plausible in many applications (Schafer 1997; Little and Rubin 2002). 
It is also the weakest condition under which we produce valid inferences by 
analyzing the desired hierarchical linear model only, i.e., by ignoring the missing 
data generating mechanism (Rubin 1976). The next section reviews currently 


available methods for analysis of ignorable multilevel missing data. 


2 Missing Data Methods 


A wide array of methods exist for efficient analysis of single-level ignorable miss- 
ing data (Rubin 1976, 1987, 1996; Dempster et al. 1977; Meng 1994; Schafer 


1997, 2003; Little and Rubin 2002). In particular, model-based multiple imputa- 


tion (Rubin 1987, 1996) is now routinely available based on widely used software 
packages such as NORM (Schafer 1999) and SAS (PROC MI, Y.C. Yuan 2000). 
These single-level methods, however, cannot be applied validly to hierarchical 
missing data and their extension to multilevel data entails challenges (Demp- 
ster et al. 1981; Schafer and Yucel 2002; Goldstein and Browne 2002, 2005; 
Yucel 2008; Shin and Raudenbush 2007, 2010, 2011). If methods developed 
for the multiple imputation of single-level data are applied to multilevel data, 
the variance-covariance structure of the imputed data sets will not accurately 
represent the multilevel educational processes that generated the data, nor will 
the structural relations at each level be captured correctly. When multilevel 
data are analyzed by a single-level method or under the misspecified number of 
levels, the resulting inferences may be considerably biased or inefficient (Shin 
2003; Shin and Raudenbush 2011; Van Buuren 2011). 

Current widely available methods for efficient analysis of ignorable multi- 
level missing data are quite limited. A two-level multivariate hierarchical linear 
model, where level-1 outcomes are subject to missingness given completely ob- 
served covariates, may be efficiently estimated via software packages such as 
Mplus (Muthen and Muthen 2010) and MLwiN (Rasbash et. al. 2009; Browne 
2012). With a univariate outcome in the model, this approach amounts to the 
complete-case analysis. When outcomes and covariates have missing values in 
the hierarchical model, however, a joint distribution of the variables subject to 
missingness has to be formulated and estimated to efficiently handle the missing 
data; and given the estimated distribution, multiple imputation of the complete 
data may be generated for subsequent analysis of the desired hierarchical model 
(Rubin 1987). Software packages such as WinBUGS (Spiegelhalter et al. 2000; 
Lunn et al. 2009), Mplus (Muthen and Muthen 2010; Asparouhov and Methen 
2010), MLwiN (Browne 2012) and R (Yucel 2008) provide Bayesian methods 


that enable formulation and estimation of such a joint distribution, and genera- 
tion of the multiple imputation for subsequent analysis of the hierarchical model. 
However, these software packages provide little guidance as to how to explic- 
itly formulate the joint distribution corresponding to the hierarchical model. 
For example, formulation of the joint distribution given a level-2 covariate sub- 
ject to missingness in the hierarchical model is neither automated nor clearly 
described by any of the software packages. In general, the transformation be- 
tween the joint distribution and the hierarchical model is nontrivial, involving 
an identification problem, and great care should be taken in formulation of the 
joint distribution that will identify the hierarchical model (Meng 1994; Shin and 
Raudenbush 2007). Otherwise, the estimation may produce biased point and 
uncertainty estimates of the hierarchical model or the formulated joint distribu- 
tion may be extremely high-dimensional to estimate well (Shin and Raudenbush 
2007, 2013). 

Multilevel ignorable missing data may be multiply imputed by univariate 
sequential regression models (Raghunathan et al. 2001), which is also known 
as multiple imputation by fully conditional specification (Van Buuren et al. 
2006). Software packages such as [VEware (Raghunathan et al. 2001) and Mul- 
tivariate Imputation by Chained Equations (MICE, van Buuren and Groothuis- 
Oudshoorn 2011) use a Bayesian method to produce multiple imputation. This 
approach specifies a univariate regression model for each variable subject to 
missingness conditional on all other variables and generates multiple imputa- 
tion based on the fitted model. While flexible in dealing with a mixture of 
continuous and discrete variables subject to missingness, the chained univariate 
models may not be compatible with a joint model (Horton and Kleinman 2007; 
van Buuren and Groothuis-Oudshoorn 2011). The implied joint model by the 


series of univariate regression models may not exist (Rubin 2003; Van Buuren 


et al. 2006). This approach has not been extended to outcomes and covariates 
subject to missingness at multiple levels of a hierarchical model (Van Buuren 
2011). 

The next section introduces an efficient missing data method via multiple 
imputation and its software for unbiased and efficient estimation of a two-level 
hierarchical linear model given ignorable missing data. A key feature is that 
the data analyst need only know the desired hierarchical model. This approach 
removes or substantially reduces the burden of the incomplete data analysis from 
the data analyst as intended by the method of multiple imputation (Rubin 1987, 
1996; Meng 1994). Consequently, with the software in hand, the incomplete 
hierarchical data analysis will introduce little more challenge than the complete- 


data counterpart to the data analyst. 


3 Efficient Handling of Missing Data 


This section explains how to efficiently estimate a two-level hierarchical linear 
model (HLM2) given incomplete data according to the missing data method 
of Shin and Raudenbush (2007). The method employs a six-step analysis pro- 
cedure to: (1) specify a desired hierarchical linear model given incompletely 
observed hierarchical data; (2) reparametrize as the joint distribution of vari- 
ables, including the outcome, that are subject to missingness conditional on all 
of the covariates that are completely observed under multivariate normality; (3) 
efficiently estimate the joint distribution using maximum likelihood (ML); (4) 
generate multiple imputation of complete data based on the ML estimates of 
the joint model; (5) analyze the desired hierarchical model by complete-data 
analysis given the multiple imputation; and finally (6) combine the multiple 
hierarchical model estimates (Rubin 1987). These steps have been implemented 


in a software package HLM 7 (Raudenbush et al. 2011). Given the hierarchical 


linear model that a data analyst specifies at the first step, HIM 7 automates 
the rest of the analysis procedure to produce efficient analysis of the hierarchical 
model. Consequently, the data analyst need only know her desired hierarchical 
linear model which is no different from the complete-data analysis. 

In this section, we introduce two comparatively simple examples of hierarchi- 
cal linear models with incomplete data to describe the problem that researchers 
confront in the conventional incomplete data analysis and how the missing data 
method resolves the problem by enabling efficient analysis via multiple impu- 
tation. One is a random-intercept model, and the other a random-coefficients 
model. Then, we present a reasonably general HLM2 given incomplete data, 
which may be efficiently analyzed by the method. Finally, we describe how to 
estimate the desired parameters and make inferences given multiple imputation 


(Rubin 1987). 


3.1 Random-Intercept Model 


To see how to handle multilevel incomplete data efficiently, it is instructive to 
consider a simple random-intercept model (Raudenbush and Bryk 2002). Let 
child 7 attend school j for i = 1,---,n; and 7 = 1,---,M. We consider a simple 


child-level or level-1 model 


Vig = Bog + Big Xig + iy (1) 


where Yj; is a univariate outcome variable, 80; is the level-1 intercept, (1; is the 
effect of a level-1 covariate X;; and child-specific random effect €;; is normally 
distributed with mean zero and variance 0°, i.e., €;; ~ N(0,07). Note that the 
model (1) is single-level within school j. Each coefficient in the level-1 model (1) 


becomes an outcome variable that may vary across schools in the school-level 


or level-2 model. We consider level-2 models 


I 


Boj Yoo + ¥o1Z; + Uo;, (2) 


Pry = 710 


where Yoo, Yor and 710 are level-2 coefficients, Z; is a level-2 covariate, and 
school-specific random effect uo; ~ N(0,7) is independent of child-specific €;;. 
By replacing $0; and 61; in the level-1 model (1) with yoo + yo1Z; + uo; and 10 
on the right-hand side of the level-2 models, respectively, we obtain a random- 


intercept model or HLM2 


Yiy = Yoo t+ 014; + N0Xij + Uoj + &;- (3) 


With data completely observed, this model may be analyzed by standard mul- 
tilevel software such as SAS, HEM 7 and MLwiN (Rasbash et. al. 2009). 
Difficulties arise given incompletely observed data. We consider (Y;;, Xi;, Z;) 
all subject to missingness with a general missing pattern in the desired model 
(3). Missing data may occur under seven different patterns for child i attending 
school j: One, two or all three values of (Y;;, X;;, Z;) may be missing. In general, 
p variables subject to missingness may produce up to 2? — 1 different missing 
patterns. Complete-case analysis drops children or observations that belong to 
any one of the missing patterns. It also deletes school 7 with missing Z; which 
entails deletion of all students attending the school. The resulting inferences 
will be inefficient and subject to bias. Ad hoc single-imputation methods fill in 
missing values for subsequent complete-data analysis. The imputed data under- 
represent uncertainty due to missing data in estimation. In general, hypothesis 
tests will be liberal, rejecting the null hypothesis too often. These methods are 


not recommended unless children and schools with missing values consist of a 


10 


small fraction of all children and schools. 

Efficient analysis of the model (3) has to analyze all available data. That is, 
rather than dropping observations that belong to any one of the seven missing 
patterns in the complete-case analysis, we drop child z in school 7 if and only 
if she belongs to one missing pattern: all three values of (Y;;, Xi;, Z;) missing. 
If one of (Yi;,Xi;,Z;) is missing for the child, the other two values available 
are analyzed; and if two values out of (Yi;, Xi;,Z;) are missing, the one value 
observed is analyzed. Consequently, children with unit non-response are also 
analyzed as long as they attend schools having observed Z; to strengthen in- 
ferences at school level. Consider, for example, school j having a single child 


sampled (n; = 1) who misses both Y;; and X;;, but school j has Z; observed. 


jo 
Note that school 7 is dropped if and only if all school mates miss both Y;; and 
Xj; and the school misses Z;. 


Efficient analysis of the HLM2 (3) using all available data may be formalized 


in the joint distribution of (Y;;,.Xi;, Z;) subject to missingness 


| Yij ] ee ll by | [ae ] 
Xiz = C245 (4) 
Zj 
for the means (a1, Q2,a3) of (Yi;,Xi;,Z;), and school-specific random effects 
bij 0 Wi Vi2 13 
bo; | ~ N O}5] dre oo gos of (Yi;,Xij;,2;) independent of 
b3; 0 V13 23 33 
; e143 0 Tu O12 
child-specific random effects ~N ; of (Yi;, Xi;) 
245 0 012 022 
where #1 = var(b1;), Yi2 = cou(b1;, b2;), Yi3 = cov(b1;, 635), Y22 = var(ba;), P23 = 


cov(b2;, 635), W33 = var(b35), 011 — var (€1;5), 712 as cov (E145, €24;) and 022 = 
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var(e2;;). Note that Z; stays the same among schoolmates with no level-1 ran- 
dom effect. The missing data method for the desired hierarchical model (3) 
via efficient estimation of the joint model (4) produces efficient analysis of the 
hierarchical model as the conditional distribution of Y;; given X;; and Z; (Shin 
and Raudenbush 2007). 

To explicitly show how to analyze all available data, we first consider children 
with a single value missing. If a single value Y;; is missing for child 7 attending 


school j, the two observed values (X;;,Z;) of the child enable estimation of 


Xi; _ | % zp ba; n C247 Pe a2 . wo2 +022 wW23 


Zj a3 bs; 0 a3 23 W33 


which adds information to estimation for (a2, a3, W22, W23, W33, 722); if a single 
value Xj; is missing, the other two observed values enable estimation of a bivari- 
ate model (Y;;, Z;) to strengthen inferences involving (a1, a3, 11, Y13, U33, 711); 
and with Z; missing, child i with observed (Y;;, X;;) adds information to esti- 
mation for (a1, a2, W11, W12, W22, 011, 12, 722) in a bivariate model (Y;;, Xi;). 
Let us now consider children with two values missing. Child 7 missing 
(Yi;, Xi;) adds information to estimation of a univariate model Z; ~ N(a3, W’33) 
at school level. Take, for example, school j having a single child (nj = 1) with 
unit nonresponse, but the school has observed Z;. If the child misses (Y;;, Z;), 
she contributes to estimation of Xj; ~ N(a2,w22 + 022); and if she misses 
(Xi;,Z;), she adds information to estimation of Yj; ~ N(ai,W11 + 011). 
Consequently, all partially observed cases contribute to estimation of the 
joint model (4), and thus of the desired model (3). The only case when school 
j is dropped from analysis happens if and only if school 7 misses Z; and all n; 


schoolmates miss both Y;; and X;;. Therefore, the method analyzes all available 


data to achieve efficient analysis of the desired hierarchical model (3). 
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Completely Observed Covariates. We now consider completely ob- 
served covariates U;; and W; at levels 1 and 2, respectively, in addition to 


(Yi;,Xij, Z;) subject to missingness. The desired level-1 model is 


Yiy = Bog + Bry Xiz + Boj Vig + €iy (6) 


where {; is the effect of the level-1 covariate U;; and everything else is defined 


in the same way as that of the model (1). We consider level-2 models 


Boj = Yoo + yo1Z; + Yo2W; + uo;, (7) 
Bry = Y10; 
22; = 20 


where Yoo, Yo1, You, Yio and 29 are level-2 coefficients, Z; and W; are level-2 
covariates, and school-specific random effect up; ~ N(0,7) is independent of 


child-specific €;; ~ N(0,07). The desired random-intercept model or HLM2 is 


Yij = Yoo + 014; + Yo2W; + V10Xig + Y2oUig + Uog + ij: (8) 


To efficiently handle missing data, we formulate the joint distribution of 
(Yi;,Xij,Z;) subject to missingness conditional on (U;;,W;) completely ob- 
served. That is, we formulate the joint model as Equation (4) where a1, a2 
and ag are replaced with aig + a11;Wj + a12Ui;, 29 + 21W; + a22Ui; and 
a30 + a31W;, respectively, and every other component is the same as it appears 
in the model (4). Note that the level-2 covariate W,; has its effects on level-1 
as well as level-2 responses (Y;;,Xi;,Z;) while the level-1 covariate U;; affects 


level-1 responses (Y;;,X;;) only. The efficient handling of missing data for the 
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joint model (4) also applies here for the joint model corresponding to the HLM2 


(8). 


3.2. Random-Coefficients Model 


This section explains the strategy for efficient analysis of a random-coefficients 
model given incomplete data. We consider the level-1 model (6) where the 
intercept as well as the coefficient of U;; vary randomly across schools at level 


2. Thus, we consider level-2 models 


Boj = Yoo + yo1Z; + Yo2W; + uo;, (9) 
Pij = Yo; 
B23 = Yeo + U2; 


where school-specific random effects u; ~ N(0,7) are independent of child- 
Uj T00 ~=—T02 

specific €;; ~ N(0,07) for uj = ’ | andr = and everything 
U25 To2 722 


else is defined identically as the counterpart of the level-2 models (7). The 


desired random-coefficients model or HLM2 is 


Yiz = Yoo + 0145 + Yo2W; + V10-Xig + YaoUig + Uog + U2jUig + E45 (10) 


for (Yij,Xij,Z;) subject to missingness, and (U;;,W,;) completely observed. 
Conventional analysis of the HLM2 (10) confronts the same problems with the 
seven missing data patterns as does that of the random-intercept model (3). 
Efficient handling of missing data for the hierarchical model (3) also applies 
here for efficient analysis of the hierarchical model (10). We consider the joint 


distribution of (Y;;,Xi;,Z;) subject to missingness conditional on completely 
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observed U;; and W; as 


Yij O10 1 ay1W; ah a2U;; bo; + b1; Ui; ei 
Xij — Q20 Q21W; + a22U 4; + bo; + Ci (11) 
4j 239 + a31W; b35 0 


where (a10, @29, @30) are the intercepts, (11, @21, @31) are the effects of W; on 


(Vij, Xij, Z;), respectively, (a12, a22) are the effects of Ui; on (Yi;, Xi;), respec- 


bo; 0 Woo Wor Wo2 Wo3 
by; 0 
tively, and school-specific il a N ; TE ies Ns 
bo; 0 Wo2 Wie We. 23 
b35 0 Wos  W13 W23 33 
: : 2 C1ij 0 O11 9712 
are independent of child-specific ~N ; 
24; 0 12 022 


Covariates Subject to Missingness Having Random Effects. Effi- 
cient estimation of the random-coefficients model (10) requires that covariate 
U;; having random coefficient uz; be completely observed. Difficulty arises 
when U;,; is subject to missingness in the hierarchical model. The joint model 
for (Yi;,Xi;,Ui;,Z;) subject to missingness has to be formulated for efficient 
handling of missing data while U;; needs to be given on the right hand side of 
the model for estimation of its random coefficient. Such a joint model cannot 
be expressed as a multivariate normal distribution so that the normal factoriza- 
tion of the joint model that leads to the hierarchical model as the conditional 
distribution of Y;; given covariates does not apply. Consequently, it is difficult 
to efficiently handle missing data in the hierarchical model via ML estimation of 
the multivariate normal joint model. We assume that covariates having random 


effects are completely observed, which is a limitation of the method. 
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3.3. General HLM2 


We now express a general HLM2 given incomplete data, which can be efficiently 
analyzed by the missing data method. The model is 


Yip = Xie t Zp y2 t+ UG a + Wi yw + Dijpuy + ey (12) 


where Yj; is a univariate outcome variable, X;,; and Z; are vectors of p; level-1 
and pz level-2 covariates subject to missingness having fixed effects 7, and yz, 
respectively, U;; and W; are vectors of p3 level-1 and py, level-2 covariates com- 
pletely observed having fixed effects 7, and y,,, respectively, and D,; is another 
vector of ps level-1 covariates completely observed having level-2 unit-specific 
random effects u; ~ N(0,7) independent of level-1 unit-specific random errors 


2. The desired parameters 


€ij ~ N(0,07) for a ps-by-ps matrix 7 and scalar o 
are 6 = (Yar Yer Yur Yws 7, 0”). 
The hierarchical models considered so far are special cases of the HLM2 (12). 


For example, the random-intercept model (3) is a special case of the HLM2 (12) 


where X;; and Z; are scalar, Uj; = 0, W; = Diz = 1, Ye = Yio, Yz = You, Yu = 9, 


Yw = Yoo, and uj; = uo;. Although the general HLM2 (12) is not required to 
represent an intercept, many applications do where the first elements of W; 
and D,;; are equal to one with the corresponding first elements of 7, and u; 
representing the mean intercept and the random deviation of the intercept from 
the mean, respectively. We require that D;; be completely observed. Note that 
U;; and D;; may share common covariates. For example, D;; may be a subset 
of covariates in U;;. 

The p variables (Y;;,Xi;, Z;) subject to missingness in the HLM2 (12) may 
produce up to 2? —1 different missing patterns for p = 1+p;+p2. Complete-case 
analysis drops children who belong to any one of the missing patterns. As the 


number of p variables subject to missingness increases, a number of children and 
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schools may have to be dropped from the analysis which results in inefficient 
inferences that may also be substantially biased. 

To describe a joint model that efficiently handles missing data in the HLM2 
(12), let A ® B be a Kronecker product that multiplies a b-by-b matrix B to 
each element of an a-by-a matrix A (Magus and Neudecker 1988), and let I, 
denote an n-by-n identity matrix for a positive integer n. For example, [3 ® B is 
a (3 x b)-by-(3 x b) diagonal matrix with diagonal submatrices (B, B, B) and all 
other elements equal to zero. Given the HLM2 (12) with missing data, we for- 


mulate the joint distribution of (Y;;,X;;,Z;) subject to missingness conditional 


on (U;;,W;, Di;) completely observed as 


Yay Uj? on Di,b1; C147 
Xig | = | Up. @ UR" )ae bo; + | e245 (13) 
4; (Ip, ® W7 Jas b3j 0 


for Ux, = [W7 Uj\]", vectors a1 and a2 of the fixed effects of Uj; on Yi; and 


Xjj, respectively, a vector ag of the fixed effects of W; on Z;, and school-specific 


bij 0 vi pi2 Ys 
random effects | by ga] ee O},] die dee Weg and child-specific 
b3; 0 Yi3 23 W33 
C1ij 0 11 O12 . 
random effects ~ N : independent. 
C243 0 O12 022 


The missing data method for the HLM2 (12) via efficient estimation of the 
joint model produces efficient analysis of the hierarchical model as the condi- 
tional distribution of Y;; given covariates (Shin and Raudenbush 2007). Note 
that given complete data, the HLM2 (12) is Y;; = Uy + WHvw + Diu; + ij 
equal to the joint model (13). The same strategy described above is used for 


efficient handling of missing data in (Y;;,.Xi;,Z;) where child 7 with at least a 
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single value observed contributes to estimation of the joint model. See Shin and 
Raudenbush (2007) for ML estimation of the joint model and multiple impu- 
tation given the ML estimates. Note that the variables subject to missingness, 
including the outcome, appear on the left-hand side given those completely ob- 
served on the right-hand side, which is the required form of the joint model (13) 


for efficient handling of missing data and efficient computation. 


3.4 Combining Estimates from Multiple Imputation 


Analysis of each of m imputed or completed data sets according to the desired 
HLM2 (12) produces m sets of 6 estimates and their associated variances. Fol- 
lowing Rubin (1987) and Schafer (1997), let Q be a parameter or a function 
of parameters in 0. Analysis of the tth completed data set produces the ML 
estimate (Q; and the associated variance U; for t = 1,2,---,m. The combined 


parameter estimate is simply the average 
GS 256 (14) 
eo re 
The variance associated with the combined estimate is 
= 1 
T = U+{1+—)]8B (15) 
m 


that consists of the average within-imputation variance 


a ape 
T = —SU, (16) 
Wet 
and the between-imputation variance 
B= —-_Y°G,-9) (17) 
m—1 é , 


The within-imputation variance U; reflects uncertainty in estimation of Q given 
the tth imputed data set (as if the missing values imputed were the true values) 
while the between-imputation variance B conveys uncertainty across the m esti- 
mates of Q due to missing data. No missing data implies B = 0 so that T = U. 
With the infinite number of imputations, the variance (15) associated with Q is 
T=U+B. The term (1+ 1/m) in Equation (15) adds extra uncertainty due 
to the finite number of m imputations. (Rubin 1987; Schafer 1997; Little and 
Rubin 2002) 

For inferences on a column vector Q of k elements, Equations (14) to (16) are 
of the same form, but Equation (17) becomes B = + pun (8s =QVO:-O)F 


where (Q; — Q)? denotes the vector (Q; — Q) transposed. 


3.5 Hypothesis Tests 


Let Q be a fixed effect or a linear function of fixed effects. We make inferences 


about Q based on 
~ t (18) 


where t, is the t distribution with the degrees of freedom 


ne oe (+t) (19) 
for 
_—— (i+=)5 (20) 
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estimating “the relative increase in variance due to nonresponse” (Rubin 1987). 


Consequently, a (1 — a) x 100% confidence interval for Q is 


Qt taaepvT (21) 


where t,,;~q/2 is the (1—a/2) x 100th percentile from t,. The p-value for testing 
a null hypothesis Hp : Q = Qo against an alternative hypothesis H, : Q 4 Qo 
at a significance level a is 
2x p(T > Ou) (22) 
VT 
where 7 is at, random variable (Rubin 1987; Schafer 1997). 
When the between-imputation variance B is low relative to U to yield a low 
r such that the degrees of freedom in Equation (19) are high, t,,;-./2 in the 
interval (21) and T in the p-value (22) can be replaced with the corresponding 
percentile 2;_,/2 and a random variable Z from the standard normal distribu- 
tion, respectively. When the relative increase r in variance due to missing data 
is high to yield low degrees of freedom, Equation (20) implies that increasing 
the number of m imputations decreases r to raise the degrees of freedom. HIM 


7 prints the degrees of freedom v in Equation (19) that can be solved for 


p= ( vjim—1)-1) (23) 


Equation (20) with U replaced with T estimates “the fraction of information 


about Q missing due to nonresponse” (Little and Rubin 2002) 


(Ge os 


Both r and s are positively associated with the between-imputation variance 
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B, but negatively associated with the number of m imputations. Frequently, 
researchers claim that a few imputations are enough to handle missing data rea- 
sonably well based on the fraction of missing information estimated by Equation 
(24). The size of the standard error of the Q estimate based on m imputations 
relative to the ideal one based on infinitely many imputations is approximately 
/1+8/m (Rubin 1987, p. 114; Schafer 1997, p. 107). When the fraction of 
missing information is high at s = 0.5, for example, the standard error VT of 
Q based on m = 5 imputations will be about /1+ 0.5/5 = 1.05 times as high 
as the counterpart based on infinitely many imputations. With s = 0.3, as few 
as m = 3 imputations will achieve about the same efficiency in terms of the 
relative size of standard errors. 

To compare model fits based on the likelihood ratio tests, let Q be k pa- 
rameters in 0 of HLM2 (12) or the full model. We want to test a null hy- 
pothesis Hp : Q = Qo versus an alternative one Hy, : Q # Qo. Let 09 be the 


parameters of the reduced model under Q = Qo. Consider, for example, a 2- 


Tol T00 TOL 
dimensional Q = in, 8= g5%e Ves o-) for 7 = 
T11 | Tol 711 
0 
and Qo = . Then, #9 has k = 2 parameters less than 6. The Q may also 


0 


include a combination of fixed effects, variances and covariances. 

Let 6* and 6% be the ML estimates of 0 and 0, respectively, given the tth 
completed data set for t = 1,---,m. The log likelihoods 1(6*) and 1(64) evaluated 
at 6! and 64,, respectively, yield the likelihood ratio statistic d, = 2[1(6") —1(4)]. 
The test statistic proposed by Li et al. (1991) is 


d/k —(m+1)(m—1)-!r1 
1l+r, 


D, (25) 
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for d = 7", d/m where 


n= (142) [AS (va-va))] 20) 


t=1 


is (1+1/m) times the sample variance of /d,,---,W/dm for vad = yn Va /m 
(Little and Rubin 2002). The r; estimates the average relative increase in 
variance due to missing data across the k parameters Q (Schafer 1997). Let 
Fy, denote a random variable from the F' distribution with k numerator and 


v, denominator degrees of freedom for 
vy = k-3/™(m—1)141/r1)?. (27) 
The p-value is given by 
P(Fxv, > D1). (28) 


With r, close to zero, 1; is large so that kD, has the chi-square distribution 
with k degrees of freedom for D, ~ d/k. Then the p-value (28) may also be 
obtained by 


P(x% > KD1) (29) 


for a chi-square random variable y% with k degrees of freedom. Given multi- 
ple imputation, the likelihood ratio statistic d, may be obtained from the tth 
completed data set to yield r;, D, and 1 for the hypothesis test. The test 
statistic D, yields an approximate range of p-values between one half and twice 
the computed p-value (Li et al. 1991). 

To obtain a more accurate p-value, let 9 = 07", 6¢/m and 0 = ye 8 /m 


so that d/, = 2[I,(0) — 1,(00)] is the likelihood ratio test statistic evaluated at 
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the average ML estimates @ and 0p given the tth completed data set. The test 


statistic proposed by Meng and Rubin (1992) is 


d' 
Ue aay oy 
for d’ = S77", d,/m where 
(m+ lag 


estimates the average relative increase in variance due to missing data across 


the k parameters Q (Schafer 1997). The p-value is given by 


P(F ev. > Dz) (32) 


where the denominator degrees of freedom is 


7 4+(u—4)[1+ (1—2/u)/ro]?, ifu=k(m—-1)>4 (33) 
o (m—1)(kK +1)(1+1/r2)?/2, — otherwise. 


Unlike D,, computation of D2 requires log likelihoods 1;(@) and 1;(09) evaluated 
at the average ML estimates 9 and 9 given the tth completed data set that 
HLM7 does not provide at the time of my writing this chapter. 

The two approaches to testing Hp : Q = Qo against Hy, : Q # Qo (Li et 
al. 1991; Meng and Rubin 1992) are based on the likelihood ratio statistics. 
When Qp involves variance components equal to zero given complete data, the 
likelihood ratio test is known to produce a conservative p-value based on the 
chi-square distribution with k degrees of freedom (Pinheiro and Bates 2000). 
Stram and Lee (1994) suggested use of a mixture of chi-square distributions 


to improve the accuracy of the p-value (Pinheiro and Bates 2000; Verbeke and 
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Molenberghs 2000; Snijders and Bosker 2012). Given incomplete data, the test 
statistic D, yields an approximate range of p-values between one half and twice 
the observed p-value (Li et al. 1991), and the test statistic D2 produces the 
p-value (32) that is more accurate than the corresponding p-value (28) (Meng 
and Rubin 1992; Schafer 1997; Little and Rubin 2002). The next two sections 
show how to analyze a hierarchical linear model given ignorable missing data 


by HLM 7 according to the method explained in this section. 


4 Data Analysis 


This section illustrates how to efficiently analyze hierarchical linear model (12) 
given incompletely observed data from the Programme for International Stu- 
dent Assessment (PISA, OECD 2007). PISA has been collecting hierarchical 
data about 15-year old students attending schools nested within nations every 
three years since the year 2000. The data for analysis consists of 5611 students 
attending 166 schools in the US from the PISA 2006 data collection. Table 
1 summarizes the data. The outcome variable of interest is the mathematics 
achievement score (MATH). PISA imputes each missing score five times to pro- 
vide five sets of plausible mathematics scores. In this section, we analyze the 
first set of plausible mathematics scores summarized in Table 1 as if they were 
completely observed. The next section illustrates an analysis strategy with all 
plausible values. 

To summarize the data for analysis, at level 1, mathematics score (MATH) 
and age (AGE) are completely observed while the highest parental occupation 
status (HISEI), the highest education level of parents in the number of years of 
schooling (PARENTED), family wealth (WEALTH) and first-generation immi- 
grant status (IMMIG1) are missing for 390, 61, 34, and 189 students, respec- 


tively. The 5611 students score 475 points in mathematics and are 190 months 


24 


old on average; the highest occupation status and education level of parents are 
52.46 units and 13.61 years on average, respectively; the average family wealth 
is 0.15 units; and 6% of the students are first-generation immigrants. At level 
2, the student-to-teacher ratio (STRATIO) is missing for 28 schools, or 17% of 
the 166 schools, and the private school indicator (PRIVATE) is missing for 3 
schools. The schools have 15.46 students per teacher on average, and 9% of the 
schools are private (cf. OECD 2007). 

Summary statistics reveal that first-generation immigrants scored 36.08 points 
lower than did other students in mathematics achievement on average. In this 
section, we ask how much of the difference is attributable to the individual and 
school characteristics summarized in Table 1; and, controlling for the individ- 
ual and school characteristics, how first-generation immigrants compare with 
other students in mathematics achievement. The complete-case analysis drops 
1405 students and 28 schools to produce inefficient inferences that may also be 
substantially biased. We compare the complete-case analysis with the efficient 


missing data analysis given incomplete data. 


4.1 Complete-Case Analysis 


Preliminary analysis reveals that the school means of the highest parent ed- 
ucation (PARENTED), indicative of school quality, vary substantially across 
schools with a 95% confidence interval (11.38, 15.82). The effect of the high- 
est parent education may vary randomly across schools of different quality. A 


random-coefficients model to test such a hypothesis is 


Yi; = YotryiSTRATIO + y2PRIVATE + y0AGE + y209HISEI 


+739 PARENTED + y49WEALTH + 591M MIG1 


+up; + us; PARENTED + &j, (34) 
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a special case of the HLM2 (12) where 


Yi; = MATH, Xj; = |[HISEI PARENTED WEALTH IMMIGI, 
Z; =(STRATIO PRIVATE), Ui; = AGE, W; =1, Diy = [1 PARENTED), 


Ye = |Y20 Y30 Y40 50], 2 = |Yo1 Yo)", Yu = 10> Yw = Yoo: Uj = [uoj us] 


T TX 
forr=| - — |. We center HISEI, PARENTED, WEALTH, AGE and 


T0333 
STRATIO around their respective sample means, and carry out the complete- 


case analysis by HLM 7 to produce the ML estimates under the heading “CC” 
in Table 2. The CC analysis analyzed 4206 students attending 138 schools. The 
733 estimate is 19 with the associated variance estimate 6.457, not shown in 


Table 2. The hypothesis test of interest is 
Ao : T33 = 0 against H, : 733 > 0. 


Approximate normality of the ML estimator In(733) for the natural logorithm 
in(-) produces an approximate 95% confidence interval for /n(733) which is trans- 
formed to an approximate 95% confidence interval (9.70, 36.90) for 733. The 
interval far away from zero provides some evidence in support of the alterna- 
tive hypothesis. For the hypothesis test, HLM 7 produces a x? test statistic 
196.47 with 136 degrees of freedom based on 137 schools with enough data 
(Raudenbush and Bryk 2002, chapter 3). The p-value is less than 0.001 to re- 
ject the null hypothesis. Therefore, the CC analysis shows that the effects of 
the highest parent education vary randomly across schools, and that attending 
a private school, age, the highest parental occupation status, the highest parent 
education and family wealth are all positively associated with math achieve- 
ment while student-to-teacher ratio and first-generation immigrant status are 


not significantly associated with the outcome. 
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4.2 Efficient Analysis 


Now, we reanalyze the random coefficients model (34) given incomplete data by 
HLM 7 according to the efficient missing data method explained in section 3. 
The ML estimates based on m = 5 imputations are displayed under the heading 
“Efficient” in Table 2. The Efficient analysis considered 5550 students attending 
166 schools after dropping 61 students with the highest parent education missing 
because the method requires that the covariate having a random coefficient 
be completely observed. The 733 estimate is 19 with the associated variance 
estimate 5.38, not shown in Table 2, that is less than the CC counterpart 6.45? 
above. Consequently, an approximate 95% confidence interval for 733 is (10.93, 
33.11) narrower and farther away from zero than the corresponding CC interval 
(9.70, 36.90). To test Ho : 733 = 0 against H, : 733 > 0, we first note that the 
null hypothesis 733 = 0 implies 793 = 0 so that k = 2. HLM 7 provides multiply 
imputed data sets. Given the tth completed data set based on the full model, we 
fit both the full and reduced models to obtain the likelihood ratio test statistic 
d, and the average d to compute D, in Equation (25). The average relative 
increase in variance due to missing data in Q = [793 733]/ is r; = 0.003 ~ 0 
to yield the test statistic D, ~ d/2. Consequently, 2D, ~ 50.75 gives the p- 
value P(x3 > 50.75) < 0.00001 based on Equation (29). This method provides 
the range of the p-value between one half and twice the computed one (Li et 
al. 1991). This precision gives enough evidence to reject the null hypothesis 
in support of the alternative hypothesis that the effects of the highest parent 
education vary randomly across schools. 

Parameter estimates with the associated standard errors, degrees of freedom 
and p-values are shown under “Efficient” in Table 2. The Efficient analysis 
shows that attending a private school, age, the highest parental occupation sta- 


tus, the highest parent education and family wealth are all positively associated 
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with mathematics achievement as the CC analysis revealed. Controlling for the 
individual and school characteristics, however, the first-generation immigrant 
status is negatively associated with the outcome while the association is not 
statistically significant according to the CC analysis. A main reason for the dif- 
ferent inferences is the lower standard error 4.61 of the Efficient analysis than the 
CC counterpart 5.27. Based on m = 5 imputations, the relative increase in vari- 
ance due to missing data in the effect estimate is r = (,/1191/4 — 1)~! = 0.06 
based on the Equation (23). The fraction of missing information s in Equa- 
tion (24) is lower than r = 0.06 so that the standard error 4.61 is at most 
/1 + 0.06/5 = 1.006 times as high as the ideal one based on infinitely many 
imputations (Rubin 1987, p.114; Schafer 1997, p.107). Consequently, the ef- 
fect estimate based on m = 5 imputations loses little precision, relative to the 
counterpart based on infinitely many imputations. That is, five imputations 
provide enough precision for estimation of the effect. Overall, the standard er- 
rors associated with the effect estimates of level-1 covariates under the Efficient 
analysis are up to 14% lower than the CC counterparts. In addition, the effect 
estimates 0.90 and 9.39 of age and family wealth under the Efficient analysis 
are considerably higher than their CC counterparts 0.64 and 6.48, respectively. 
Furthermore, the CC analysis seems to exaggerate the goodness of fit by pro- 
ducing smaller variance estimates than those of the Efficient analysis. At level 
2, the CC analysis produces a lower standard error associated with the effect 
estimate of the private school indicator than does the Efficient analysis. The 
relatively understated CC standard error reflects its positive association with 
the comparatively underestimated CC variance components. 

Based on the Efficient analysis, a typical non-immigrant student attending 
a public school with average age, highest parental occupation status, highest 


parent education, and family wealth scores 471 points in mathematics achieve- 
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ment on average. Students attending a private school score 34.51 points higher 
than do those attending a public school on average, controlling for the effects 
of other covariates in the model. One month older in age, a unit increase in the 
highest parental occupation status, one year increment in the highest parent 
education and a unit increase in family wealth are expected to raise mathe- 
matics scores by 0.90, 0.86, 4.18, and 9.39 points, respectively, ceteris paribus. 
Controlling for the individual and school characteristics in the model (34), the 
average difference in mathematics achievement between first-generation immi- 
grants and other students reduces to 10.81 points or 30% of the initial gap, 
36.08 points. Consequently, the individual and school characteristics considered 
in the model (34) explain 70% of the initial gap in mathematics achievement 


between first-generation immigrants and other students. 


5 Data Analysis with Plausible Values 


The Efficient analysis of HLM2 (34) in the previous section considers the first 
set of plausible mathematics scores as if they were completely observed. Con- 
sequently, the m(= 5) imputed or completed data sets reflect uncertainty due 
to the missing values of covariates, but does not take into account uncertainty 
from missing outcome values to produce understated standard errors of esti- 
mates. This section illustrates an efficient analysis strategy for the model (34) 
using all five sets of plausible mathematics scores. The first set has mean (stan- 
dard deviation) equal to 475.18 (89.87) shown in Table 1. The second to the fifth 
sets have means (standard deviations) equal to 474.44 (89.09), 474.46 (88.95), 
474.97 (88.66), and 474.54 (89.41), respectively. 

In the previous section, we produced m imputations to efficiently analyze 
the desired HLM2 (34) with covariates subject to missingness given the first 


set of plausible outcome values. In this section, we repeat the same analysis 
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with identical covariates given each set of plausible outcome values. With the 
number of qg sets of plausible outcome values fixed at 5, this strategy produces 
5m(= q x m) completed data sets. Unlike the efficient analysis of the previous 
section based on the first set of plausible outcome values, the 5m imputations 
reflect uncertainty in parameter estimates due to missing values of both outcome 
and covariates. Note that we may obtain more imputations by increasing the 
number of m imputations per set of plausible outcome values. It is important 
flexibility to be able to increase m that will decrease the relative increase in 
variance due to missing data in Equation (20) and, thus, increase the degrees of 
freedom of an estimate in Equation (19), in particular, when the missing values 
of covariates account for a considerable amount of uncertainty in estimation. 
The degrees of freedom of an estimate is negatively associated with the p-value. 
This flexibility is not available to us for the outcome variable because the number 
of q sets of plausible outcome values is fixed at 5 by the imputer. Because the 
efficient analysis in the previous section reveals that uncertainty in estimation 
due to the missing values of covariates is not substantial, we generate m = 1 
imputation per set of plausible outcome values to analyze 5 imputations in 
this section. Then we use the “Multiple Imputation” option of HIM 7 that 
automates the complete-data analysis of the hierarchical model (34) given the 
multiple imputation to produce the combined estimates (Rubin 1987). 

The ML estimates are displayed under the heading “Efficient PV” in Table 
2. We compare the Efficient PV analysis with the Efficient analysis based on the 
first set of plausible values in the previous section. Again based on 5550 students 
attending 166 schools, the estimated 733 is 14, lower than 19 produced under the 
Efficient analysis. The associated variance estimate is 6.097, higher than 5.38 
under the Efficient analysis, to reflect added uncertainty due to missing outcome 


values. An approximate 95% confidence interval for 733 is (5.97,32.85) wider and 
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closer to zero than the Efficient analysis counterpart (10.93, 33.11). For testing 
Ho : 733 = 0 versus Hy : T33 > 0, the average relative increase in variance due 
to missing data in Q = [73 T33|7 is r1 = 0.61 based on Equation (26) to yield 
the test statistic D,; = 13.73. Higher than the corresponding 71 = 0.003 based 
on the first set of plausible values in the previous section, the r; = 0.61 implies 
that missing outcome values add a considerable amount of uncertainty to the 
Q estimates. The p-value (28) is P(F2is > 13.73) = 0.0002 to reject the null 
hypothesis in support of the alternative hypothesis that 733 > 0. 

Both Efficient PV analysis and Efficient analysis produce comparable effect 
estimates and the same statistical inferences. However, the Efficient PV analysis 
yields comparatively low degrees of freedom overall to reveal added uncertainty 
in the estimates due to the missing values of the outcome variable. In particular, 
the degrees of freedom for the effect estimate of the first-generation immigrant 
status reduce from 1191 under the Efficient analysis to 66 under the Efficient 
PV analysis. The 66 degrees of freedom translate into the relative increase in 
variance due to missing data r = (66/4 — 1) = 0.33 based on Equation 
(23), which is a substantial increase from the corresponding r = 0.06 under the 
Efficient analysis. The relative increase in r implies that added uncertainty due 
to missing outcome values is considerable, thereby inflating the standard error 
4.61 and the p-value 0.019 under the Efficient analysis to 5.08 and 0.045 under 


the Efficient PV analysis, respectively. 


6 Extensions and Limitations 


This chapter explained how to efficiently analyze a two-level hierarchical lin- 
ear model where explanatory as well as outcome variables may be subject to 
missingness with a general missing pattern at any of the levels (Shin and Rau- 


denbush 2007). The key idea is to reexpress the desired hierarchical model as 
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the joint distribution of variables, including the outcome, that are subject to 
missingness conditional on all of the covariates that are completely observed 
under multivariate normality; estimate the joint distribution by ML; generate 
multiple imputation given the ML estimates of the joint distribution; analyze 
the desired hierarchical model by complete data analysis given the multiple im- 
putation; and then combine the multiple hierarchical model estimates (Rubin 
1987). Given the desired hierarchical model specified by a data analyst, the rest 
of the analysis steps can be automated for efficient estimation of the hierarchi- 
cal model. The automation has been implemented in a software package HLM 
7 that is yet to be released to the public at the time of writing this chapter. 
With such a software package in hand, multilevel incomplete data analysis is no 
different from complete data analysis from the data analyst’s perspective. 

This chapter illustrated two examples for efficient analysis of incompletely 
observed PISA data with HLM 7 The outcome variable was a mathematics 
achievement score subject to missingness. PISA imputed five sets of plausible 
values for each missing outcome value. Assuming the first set of plausible math- 
ematics scores completely observed in the first example, we efficiently analyzed 
hierarchical linear model (34) given covariates subject to missingness at multi- 
ple levels. We compared the efficient analysis with the complete-case analysis. 
Overall, the complete-case analysis produced higher standard errors than did 
the efficient analysis, and some estimates of the complete-case analysis were con- 
siderably different from the counterparts of the efficient analysis. Consequently, 
the two analyses produced different statistical inferences for the effect of a key 
covariate. The second example was analysis of the same model (34) with co- 
variates subject to missingness and all plausible outcome values. We repeated 
the efficient analysis of the first example with identical covariates given each 


set of plausible outcome values, each time imputing a single completed data set 
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to eventually produce as many completed data sets as the five sets of plausible 
outcome values for subsequent complete-data analysis. The combined estimates 
were comparable to and produced the same statistical inferences as those of the 
efficient analysis in the first example. On the other hand, the degrees of freedom 
for estimates were considerably lower than those of the first example, overall. 
That is, the estimates exhibited higher relative increase in variance due to miss- 
ing data than did those of the first example based on the first set of plausible 
values. Consequently, a substantial amount of uncertainty due to missing data 
in estimation was from missing outcome values. 

The second example illustrates one of the difficulties in incomplete data 
analysis when the imputer of the plausible outcome values is different from 
the data analyst of the desired hierarchical model (34) (Meng 1994; Rubin 
1996). With g = 5 sets of plausible outcome values and covariates subject 
to missingness at multiple levels, the analysis based on 5 imputations (m = 
1 imputation per set of plausible outcome values) reveals that a considerable 
amount of uncertainty due to missing data in estimation is from missing outcome 
values. Consequently, the data analyst may want to increase the number of q 
imputations of plausible outcome values, which may increase the degrees of 
freedom and reduce the computed p-value because the degrees of freedom (19) 
of an estimate is positively associated with the number of imputations. However, 
with access to 5 sets of plausible outcome values only, she is unable to reduce 
the relative increase in variance due to missing outcome values. Increasing the 
number of m imputations per set of plausible outcome values may help reduce 
the relative increase in variance due to the missing values of covariates, not due 
to missing outcome values. When both outcome and covariates contain missing 
values, the efficient missing data method and its software introduced in this 


chapter provide flexibility to manipulate the number of imputations. In that 
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case, the data analyst is able to set the number of imputations and produce 
multiple imputation tailored to the analyst’s own analysis with the software. 
Because the analyst is also the imputer, the analyst will not face the difficulty 
that arises when he or she is not the imputer (Meng 1994; Rubin 1996). 

The two-level missing data approach in this chapter has been extended to 
efficient analysis of a two-level contextual-effects model (Raudenbush and Bryk 
2002; Shin and Raudenbush 2010); of a three-level hierarchical linear model 
(Shin and Raudenbush 2011; Shin 2012) and of an arbitrary Q-level hierarchical 
linear model (Shin and Raudenbush 2013) given incomplete data. Three-level 
user-friendly software based on the missing data methods of Shin and Rauden- 
bush (2011) and Shin (2012) is under development at the time of writing this 
chapter. These advances guide us with continuous variables subject to missing- 
ness. 

The efficient analysis of HLM2 (34) in Table 2 involves discrete first-generation 
immigrant status and private school indicator subject to missingness at levels 1 
and 2, respectively. Although it is not appropriate to handle the discrete miss- 
ing data under the corresponding multivariate joint normal distribution (13) of 
variables subject to missingness, the implied conditional model is the desired 
hierarchical model (34). Furthermore, the joint model assumption (13) to han- 
dle missing data affects only imputed data, not observed data. The advantage 
is that the hierarchical model is analyzed by the efficient missing data method 
(Schafer 1997; Shin and Raudenbush 2007). However, with the missing rate 
high, the normal joint model assumption becomes nontrivial. A useful future 
extension of this approach is to efficient and robust handling of normal and 
nonnormal multilevel missing data. 

It entails challenges to extend the missing data method introduced in this 


chapter to analysis of multilevel incomplete data from multistage sampling 
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where different selection probabilities of units are used (OECD 2007; Tourangeau 
el al. 2009). The extent to which complicated sampling weights affect the miss- 
ing data analysis is not well known. To minimize possible adverse impact such 
as biased inferences, the sampling weights may be applied at the final stage of 
complete-data analysis given multiple imputation (Graubard and Korn 1996; 
Pfeffermann et al. 1998; Korn and Graubard 2003). An important future re- 
search topic is extension of the efficient missing data method to analysis of 
multilevel incomplete data generated from multistage sampling with different 
selection probabilities of units. 

Another limitation of the missing data method is that the covariates having 
random coefficients must be completely observed. With such covariates subject 
to missingness, the joint distribution of variables subject to missingness may 
not be expressed as a multivariate normal distribution. Consequently, subse- 
quent analysis given the estimated normal joint model by ML does not apply. 
Research is under way to relax the assumption, which will broaden the applica- 


bility of this method. 
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in the number of years of schooling 

WEALTH family wealth 0.15 ( 0.80) 34 
IMMIG1 1 if 1st-generation immigrant 0.06 ( 0.24) 189 
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